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ABSTRACT. We consider nonparametric estimation of cure-rate based on mix- 
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ture model under Case-1 interval censoring. We show that the nonparametric 
maximum-likelihood estimator (NPMLE) of cure-rate is non-unique as well as in- 
consistent, and propose two estimators based on the NPMLE of the distribution 
function under this censoring model. We present a cross-validation method for 

>• ' choosing a 'cut-off' point needed for the estimators. The limiting distributions 

00 ' 

of the latter are obtained using extreme-value theory. Graphical illustration of 
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o 

B 

> 

X 
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1. Introduction 

Consider a sample of individuals on each of whom some sort of time-to- event data is 
being collected, for instance, onset time of a disease following exposure to infection, time to 
death under a terminal disease, time (for criminals) to re-offend after at least one offence etc. 
In most such cases, there may be a possibility that the individual may be immune (e.g., not 
catch a disease) or get cured (e.g., cured of a disease or not re-offend). This is all the more 
relevant when the data is subject to some kind of 'open-ended' censoring such as random 
censoring, double censoring or interval censoring, where an individual being censored (i.e., 
event not occurred), especially after a large amount of time, points to the possibility of cure. 
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the procedures based on simulated data are provided. 

Key-words: Case-1 interval censoring, cross-validation, cure-rate, extreme-value theory, non- 
homogeneous Poisson process, nonparametric maximum-likelihood estimator, strong approx- 
imation, variance-bias trade-off. 



In the literature, the term long-term survival has also been used for cure. 

Cure is usually quantified by the probability of cure, or the cure-rate: p = P{X = oo}, 
where X is the time-to-event of interest. Most of the statistical literature on cure is based on 
one of the two following models for the 'improper' random variable X: the mixture model, in 
which P{X > t} = p + (1 — p)S (t), S'o(-) being a proper survival function representing the 
finite part of X (Berkson and Gage, 1952); and the bounded cumulative hazard (BCH) model, 
in which P{X > t} = exp(— J Q * h(s)ds), with 9 : = J °° h(s)ds < oo, so that p = exp(—9) 
(see Tsodikov et al (2003) for an excellent review). Inference, with or without (random) 
censoring, has been based mostly on either the Bayesian approach (see Yin and Ibrahim 
(2005) and the references therein) or a semi-parametric approach (see Zhao and Zhou (2006) 
and the references therein). 

From the non-parametric point of view, it is clear that the two models above are equiv- 
alent. Notable among the nonparametric approaches are: Laska and Meisner (1992), who 
consider the NPMLE of p under random censoring when a number m > 1 of cures are known; 
Mailer and Zhou (1996), who consider the value of the Kaplan-Meier distribution function at 
the largest datum as an estimator of (1 — p) (as is well-known, the value is less than unity if 
the largest datum is censored — an indication of cure). See Section 2 for more comments on 
these two works. Another interesting paper is Betensky and Schoenfeld (2001), who consider 
a time-to-cure, rather than just possibility of cure, competing with time-to-event/censoring. 

In this paper we study estimation of cure-rate under Case-1 interval censoring, or current- 
status data, using the mixture model. We have been able to trace only one paper so far under 
this set-up, namely Lam and Xue (2005), who work with a semi-parametric model, allowing 
the cure-rate to depend on covariates via a logit function. We consider only the parameters 
(F,p), the time-to-event distribution function and the cure-rate, respectively. Of course, this 
is a semi-parametric model too, but one without covariates. We show that the Mailer-Zhou 
idea does not work here and propose two estimators of p based on the usual (i.e., when 
p = 0) NPMLE of F, as given by Groeneboom and Wellner (1992). The asymptotics of the 
estimators are obtained using extreme-value theory. 

In Section 2, we describe the Case-1 interval censoring model with cure-rate and show 
that the NPMLE of p is non-unique and inconsistent. We then propose the two estimators 
that depend on a 'cut-off' point. Section 3 shows how to make an optimal choice of this cut- 
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off point, because it involves a variance-bias trade-off as in extremal index estimation (see, 
for instance, Embrechts et al. (1997)). In Section 4, limiting distributions of the estimators 
are derived. Use of the latter to construct confidence intervals for p is straightforward. 



2. Model, preliminary results and estimators 

Consider a variable of interest X, say X = time to development of cancer following 
exposure to radiation and an observation time Y, say Y = time of check-up. Under Case-1 
interval censoring model, one observes the so-called 'current status' data 

(Si, Yi), i = 1, 2, . . . , n, where Si = I(Xi < Yi), 

and Yi, . . . , Y n are iid with distribution G, independent of X 1 , . . . , X n which are iid with dis- 
tribution F. Suppose we want to estimate F(x) = P{X < x}. The nonparametric maximum 
likelihood estimator (NPMLE) is obtained by solving: 

maxL(Fi, ...,F n ) 

subject to < Fi < . . . < F n < 1, (1) 



where 



L(F U ..., F n ) = log(Fj) + (1 - %]) log(l - Fi)), 



and Fi = F(Y^), Y^ : order-statistics for (Yi, . . . , Y n ), 6$ = concomitant of Y^, 1 < i < n. 
Solution is given by the 1 ' max-mirC formula of Groeneboom and Wellner (1992), namely, 



..mi - — — . 

h<i k>i k — h + 1 



rj = maxmm- — — . (2) 



Cure-rate. Consider again X = time to cancer, this time with possibility of no cancer = 
cure. Then X can be modelled as an 'extended' real- valued random variable with a defective 
distribution, i.e., 

P(X = oo) = p = cure-rate > 

so that P(X < t) = F p (t) = (1 - p)F(t) and P(X > t) = S p (t) = p + (1 - p)(l - 
F(t)) = V + (1 — P)S(t). In this case the likelihood function in Eq.(l) has to be modified as 
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max L c (p, Fi, . . . , F n ) where 

L c (p,F 1? ...,F n ) 

n 

= Yslh] M(l - P) F i) + (1 " %]) log(p + (1 - p)(l - Pi))] 
i=i 

sufy'ec* to < p < 1, < Fi < . . . < F n < 1 

n 

= ^[4, tog(Fi) + (1-^)10^1-^)] 

1=1 

siify'ec* to < p < 1, < Fi < . . . < F n < (1 - p), (3) 
writing Fj for (1 — p)Fj in the last equality. 

Failure of NPMLE. We state the following theorem whose proof is omitted because it is 
long and technical: 

Theorem 1. Let L c (p) = max < Fl <...< F „<(i_ p ) L c (p, F u ..., F n ). Then 

L c (p) = L(F 1 A(l-p),...,F n A(l-p)), 

where A denotes 'minimum' and F i; 1 < i < n, are as in Eq.(2). 

This leads to the following two observations about the NPMLE of p: 

Remark 1: Non-uniqueness of NPMLE. Obviously, L c (p) is non-increasing in < p < 
1, and 

sup L c (p) = L(F 1 ,...,F n )=L c (p), 

0<p<l 

for any < p < (1 — F n ). Hence p is unique if and only if (1 — F n ) = = p. This was 
also observed, in the case of random censoring, by Laska and Meisner (1992), who showed 
that NPMLE was unique and positive if, however, some number m > 1 of cases of cure were 
known. We shall explore this situation in a future paper. 

Remark 2: Non-consistency of NPMLE. Note that by Eq.(2), 



F„ = max 



i<n n — % + 1 ' 

so that F n — 1 if and only if 5[ n ] = 1. Thus for < p < 1 and any < e < p, 
P{\F n - (1 -p)\ > e} > P{F n = 1} = P{S [n] = 1} = (1 -p)E(F(Y {n) )) - (1 -p)F(r G ), 
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where tq = sup{y\G(y) = 1}. Hence F n is not a consistent estimator of (1 —p). This is in 
stark contrast to the case of random censoring where the former was shown to be in fact 
•\/n-consistent (asymptotically normal) by Mailer and Zhou (1996). 

The proposed estimators. Let us look at 



max — = max 



i<n n-i + 1 x<Y (n) Y™ =1 H Y j > x ) 

Thus F n is the maximum of the tail-averages of the concomitants, 5[^, 1 < % < n. Hence 
consider the ratio empirical process 

Pin(x) := E n =iI{Yj > x) - Pi(x) := (1 - P)j^ 
almost surely for each x > as n — > oo. Moreover, note that 

Pi(x) | (1 — p) as x | oo 

and 

P2n{x) := maxj9i„(y) -> (1 - p) max y rQO | (1 - p) as x t oo 

J y 

These observations lead us to the following: 
Estimator- 1. Define 

Pin = Pln{X n ) 



e; =1 /(^>x„) ' 

i.e., tail-average at a suitable sequence x n j oo of 'cut-off' points. 

Figure 1 gives a sample-plot ofpi„(i) = pin^z)) = Sj=i $[j}/(, n ~ ^+ 1) against 1 < i < n, 
for p = 0.3, n = 100. It is seen that for % xs 55, pi n (i) ~ 0.7 = (1 — p). For comparison, a 
sample-plot for another sample with p = (i.e., no cure) is also given. 



Estimator- 2. Define 



f>2n = P2n(x n ) = HiaXj9i n (w), 



i.e., partial maximum of the tail-averages (rather than the global maximum F n which is 
inconsistent). 

Figure 2 gives a sample-plot of p^ n {i) = max fc <j YTj=k <%]/ (n — i + 1) against 1 < i < n, 
for the same sample as in Figure 1. P2n(") looks more stable than pi n (-), as is to be expected. 
The choice of x n for a given sample of size n is discussed in the next section. 
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3. Choice of cut-off point 

Consider 

Pln -(1-p) 
= (Pln - Pl(x n )) + (Pl(x n ) ~ (1 ~ p)) 

T^iVhW > x n ) ~ (1 ~ P)(JZ FdG/G{x n ))I{Y > x n )] Q{x n ) 

nG(x n ) n~ l YTj=i H Y j > x n) 

POO 

-(1-p) / (l-F)dG/G(x n ) 

Jx n 

= A n (x n )C n (x n ) - B n (x n ), say, (4) 
where G(x) = / x °° dG = 1 - G(ar). Now 

/'OO poo 

nG(x n )(v a r A n (x n )) = (1-p) / FdG/G(x n )-[(l-p)( FdG/G(x n ))f —> p(l—p) (5) 

J X n J X n 

as x ri oo . 

Further, C n (x n ) = Op(l) (see Shorack and Wellner (1986), p. 415) and B n (x n ) = o(l) as 
x„ — > oo. Hence from Eq.(4), 

p ln - (1 - p) = (p ln - Pl (x n )) + ( Pl (x n ) - (1 - p)) = P ((nG(x n )y 1 ' 2 ) + o(l), 

Variance-bias trade-off. Thus we have the following trade-off: as n — > oo, we must have 
x„ | oo (so that the bias —B n (x n ) — > and also G(x n ) — > 0), but slowly enough so that 
nG(x n ) oo (i.e., var (j4„(i n )) — > 0). A similar phenomenon occurs in the case of the Hill 
estimator of extremal index in extreme value theory (see Embrechts et al, 1997, p. 341). 

In view of Eq.(4)-(5), optimal order of x n j oo could be determined by minimizing, with 
respect to x, the function 



M, 



POO 

,(x) = (p(l-p)/nG(x)) + (1 -pf( / (1 - F)dG/G(x)) 2 . 

J X 



Example 1. Let F, G be Exponential (A) and Exponential (//) distributions, respectively, 
i.e., F(x) = 1 — F(x) = exp(— Xx), G(x) = 1 — G(x) = exp(— fix). Then we have 

M n (x) 

POO 

= ( p (l- p )/nG(x)) + (l-p) 2 ( (1 - F)dG/G(x)) 2 

J X 

= n _1 p(l -p)exp(/ix) + ((1 -p)/i/(A + //)) 2 exp(-2Ax), 



and (d/dx)(M n (x)) = gives 

n _1 p(l — p)fiexp(fix) = ((1 —p)/jl/(X + /z)) 2 2A exp(— 2Ax), 



or 

x n = (/i + 2A)- 1 log (((1 - p)fi/2p\(\ + /i) 2 )n) . 

Thus nG(x n ) = c(p, A, /i)n 2A ^ M+2A \ which shows that the optimal rate of convergence, 
(nG(x n )) l l 2 = 0(n A /^ +2A )), is much slower than y/E. 

Cross-validation. Eq.(4)-(5) also suggest that we could make a data-driven choice of x n , 
say x n , as the minimizer of 

M n (x) := var (A n (x)) + B 2 n (x) 

with respect to x, where var (A n (x)) and B n (x) denote suitable estimators of var (A n (x)) 
and B n (x), respectively. 

Now an obvious choice of var (A n (x)) is 

« {Mx)) = (6) 

where we have used P2n( - ) in view of its stability, as is evident from Figure-2. The choice 
of B n (x), however, is not clear in general. Let us therefore consider the special case of the 
Koziol-Green model of censoring: 

Assumption A.l. 1 - F(x) = (1 - G(x)) a for some a > 0. 
Under A.l, we have 

B n (x) = -(l-p)(l-G(x)r/(a + l) (7) 
E(l-8) = p+(l-p)/(a + l) 
whence (1 -p)/(a + l) = E(l - 5) - p (8) 
and a = E{5)/[E{1 - 5) - p\. (9) 

We then replace E(5) by S n := n -1 YH=i ^ an d (1 ~~ P) 

n „ 

P2n := ^ _1 ^P2n(^) = / P2n (x)dG n (x) , (10) 
i=l ^ 
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where G n {-) is the empirical distribution function of Yi, . . . , Y n . This is motivated as follows: 
for y > 0, 

f 00 P2n (x)dG n (x) FdG/G(x))dG(x) 

Jy £ n{y) - (1 " P) Jy Jx Q [ y) = (1 - p)[l - (a + 1)~ 2 (1 - G(y)n 

which has bias of a smaller order than P2n(y)'i to a first approximation, we let y — to get 

p2n- 

Thus by Eq.(6)-(10), we arrive at the following cross-validation function: 

2a 

(11) 



M n {x)- E « =i/( y. > x) +(Pm *n) 



n 



where a = S n /(p2 n — S n ), which could be minimized with respect to x to obtain x n . 

In general, motivated by Eq.(10) we could estimate the bias, B n {x) = (1— p) FdG/G(x) — 
(1 — p), by B n (x) := P2n{x) — P2n- This leads to another cross-validation function 



T 2l \ Pln(x)(l -Pln(x)) 



M " (x) = E - i W> V + iP2n(x) " p2n) (12) 

Figure 3 gives sample-plots of M l n (i) = M^(Y(j)), Z = 1,2. Both the curves exhibit clear 
convex shapes with unique minima. However, M^(-) shows a spurious minimum at the 
upper extreme, which must be discarded. Further, the respective minimizers are seen to 
underestimate (1 — p), so there appears to be scope for improvement here. 
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Figure 3: sample-plot of M n (i) vs. i: p = 0.3, F = Exp (2), G = Exp (1), n = 100, 
Mn(') (solid line: minimizer i\ = 58, ignoring i = 100, £>2n(58) = 0.651), M^(-) (broken line: 
minimizer %2 = 47, P2n(47) = 0.611) 



4. Limiting distributions. 

Eq.(5) suggests that ]3 lri would require a random norming, namely (Y^=i I(Yj — x n)) 1 ^ 2 , 
for asymptotic normality. We establish this, as well as the limiting distribution of p2 n , using 
the asymptotic theory of sample extremes. To this end, assume 
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Assumption A. 2. G(-) belongs to the maximum domain of attraction of an extreme-value 
distribution G e (-), i.e., there exist sequences of constants a n > 0, b n , n > 1, such that 
G n (a n x + b n ) — > G e (x), or equivalently nG(a n x + b n ) — > — log(G e (x)), as n — > oo, for each 

x eM. 

It is well-known that, under A. 2, YTj=i H Y j — a n x + b n ) converges weakly to a non- 
homogeneous Poisson process with mean-function A (a;) = — logG e (x). It turns out that 
YTj=i fijI(Xj — a n x + b n ) converges to an (independently) thinned version of this process. 

Lemma 1. With —> denoting weak convergence in the space D(M) of right-continuous 
functions on M with left-limits, we have, as n — > oo, 

(a) N n (x) := Y^j=i 1 0^3 — a n x + b n ) —> N(x) = N([x, oo)), a Poisson process with mean 
A (:r) = -\ogG e (x); 

(b) (N ln (x),N 0n (x)) ^ (N^x), N (x)), where N ln (x) := YT j= ^A Y i > a n x+b n ) , N 0n (x) := 
X^=i(l — — a n x + b n ), and A r 1 (x), N (x) are independent Poisson processes with 
mean-functions (J,i(x) := (1 —p)A(x), fio(x) := pA(x) respectively. 

(c) Further, Ni(x) = Ylf=i^ Vj? an d N (x) = ^2f=i\l— where (771, 772, • • •) are iid Bernoulli (1— 
p), independent of AT(-), and iV(-) is the Poisson process defined in Part (a) above. 

Proof: 

(a) This is a classical result. For a proof see, for instance, Embrechts et al. (1997). 

(b) First, consider weak convergence of N ln (x) alone. It is enough to verify convergence of 
the finite-dimensional distributions (iVi n (xi), . . . , N ln (xk)), k > 1 (see, for instance, Karr 
(1991), Theorem 1.21, p. 14). For the sake of convenience let us consider just two points, 
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(xi,x 2 ) with x\ < x 2 . Then with i 



-1 and any real numbers ti,t 2 , 



E\^{it x N Xn {x x ) + it 2 N ln (x 2 ))] 

(E[exp(5 1 {it 1 I(Y 1 > a n x x + b n ) + it 2 I(Y 1 > a n x 2 + b n )})]) n 

poo ra n xi+b n pa n X2+b n 

(p+(l-p) (1 - F)dG + (1 - p) / FdG) + e itl (l-p) 

Jo Jo J a n xi+b n 



FdG 



+e 



it\+it2 



(1-p) 



FdG 



a„x 2 +b 

n 



rCLnX2 + bn 

1 + n- 1 nG(a n x 2 + b n ) { (1 - p)(e Ul -I) FdG/G(a n x 2 + b n ) 

J a n xi+b n 



poo 

+ (1 - p)(e u ^ -I) FdG/G{a n x 2 + b n ) 

J a n X2+b n 

exp ((1 -p)A(x 2 ) {(e^ - ^(A^OA" 1 ^) - 1) + (e 



U1+U2 



whence the result. Note that here we have used the fact that as n — > 00, (a n x + b n ) — > r G , 
so that f™ x+b FdG /G(a n x + fo n ) — > 1. The joint weak convergence of (N ln (x), N 0n (x)), as 
well as their asymptotic independence, follow by exactly similar arguments, 
(c) The representations of (Ni(x), N (x)) are obvious. n 
Next note that 



3=1 3=1 



(13) 



where x' n = (x n — b n )/a n . Therefore, in addition to the weak convergence in Lemma 1, we 
need strong approximation by a Poisson process. This follows in a straightforward way from 
Einmahl (1997) and is stated below: 

Theorem 2. Under A. 2, on some probability space one can construct the random variables 
(5i, Yj), % — 1, 2, . . . , and a sequence of Poisson processes N' n = (N' ln , N' Qn ) on 1R x 1R, where 
for each n > 1, N[ n , Nq h are independent with mean-functions /ii(x), tio(x), respectively, 
such that as n — > 00, 

^Px:0<G e (x)<l \Nln(x) - N[ n (x)\ ^ 0, 
^Px:0<G e (x)<l \N n(x) ~ K n (x)\ ^ 0. 

Proof: Follows by arguments similar to the proof of Corollary 2.6, p. 37, of Einmahl (1997) 

We are now ready to state the limiting distributions of our estimators. In Theorem 3 
below, by 'lim' we mean limit in distribution. 
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Theorem 3. Under A. 2, if nG(x n ) — > oo as n — > oo, then 
(a) A(x^) -> oo, where = (x n - b n )/a n ; 



(b) let 



then 



Z\ n — 



{Yr J=l i(Y 3 >x n )yi\ v uxn)-{i-p)) 



lim Z ln 

n^oo 

lim 



^ Vj -(l- p ) 



/y/p{l-p) = Normal (0,1), 



where (771, r) 2 , ■ ■ ■) are iid Bernoulli (1 — p) as in Lemma 1, Part (c); 



(c) let 



then 



J2n 



(E? = l^>^)) 1/2 (P2n(x»)-(l-p)) 

y/p{l-p) 



lim Z 2n 

n^oo 



lim a/ N(x' n ) sup 



JV(x) 1 P) 



/y/p( 1 ~P) = half-Normal (0, 1), 



where 'half-Normal' (0, 1) is the distribution of | Normal (0, 1)|. 



Proof: 

(a) Since extreme- value distributions are all continuous, the convergence \G n (a n x + b n ) — 
G (x)\ — > is uniform in x. Now nG(x n ) — > 00 =>- G n (x n ) = G n (a n x' n + 6 n ) — > 0, hence 
Go 0*4 ) — > 0. The result follows because A(xjj) = — logGo(a4). 

(b) Note that 

(E; =1 /(^>x n )) 1 /2( Pln ( Xn )_(i_ p )) 



lim 

n^oo 



= lim(7V ln «) + 7V nK)) 1/2 

n^oo 

= lim(^ n K) + ^ n «))^ 



(1-p) 
(1-p) 



/y/p(l-p) 



/y/p(l-p), 
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by Theorem 2. The result now follows using the representation in Lemma 1, Part (c), and 
the random central limit theorem, since (771,772, • • •) are iid Bernoulli (1 —p), independent of 
N(-), and further, by Part (a) above, A(x' n ) — > 00, N (x' n ) / A(x' n ) — > 1, as n — > 00. 
(c) This result too follows as in Part (b) above, by noting that 



lim y/N(x' n ) sup 

n ^°° x<x' 



jV(x) 



(1-P) 



/ Vp(1 —P) = l im SU P 

n->oo m>n 



Em 

m 



(1-P) 



Weak convergence of the sequence on right-hand-side to the half- Normal distribution is estab- 
lished in Robbins et al (1968) (see also Stute (1983) for a generalization to M-estimators). n 



Remark 1. Figures 4 and 5 give histograms of Z ln and Z 2n , respectively, based on 5000 
samples each. Either of Z\ n and Z2 n may easily be used to construct confidence intervals for 
(1 —p). However, note that limiting variance of Z ln = 1 > 1 — 2n~ 1 = limiting variance of 
Z 2n . Hence the latter may be a better choice. On the other hand, Figure-5 shows that the 
convergence of Z 2n to the half-Normal distribution is not very good. 
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Figure 5: histogram of studentized Z 2n '- V = 0.3, F = Exp (2), G = Exp (1), n — 100, based 
on 5000 samples 
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